Homework 05: Contours and Gradients

#1: We know that quadratics are useful because smooth objective functions look like quadratics if you zoom in just right. Consider the objective function for Ordinary Least Squares (OLS) regression: the sum of the squared residuals, where

    residual = (y data value) - (m*x + b)

Is this function (the sum of squared residuals) exactly quadratic as a function of m and b? Show symbolically that this objective function is or is not exactly quadratic, using symbolic arguments (algebra) rather than numeric investigation.

#2: Re-do the contour plot using the sum of the absolute values of the residuals, rather than the sum of squares. To get a pretty picture, you might need to adjust:
* the resolution of the grid we compute, or
* the number or values of the contours, or
* the range of values that we scan.

You may use the original xdata and ydata, or the xdata and ydata after subtracting the mean of each. Optionally, you could try doing both.

If you don't want to use Python or Matlab/Octave, you may use whatever other software suits you. You can do it with a 2-dimensional "What If" table in Excel--ask if you're curious. But Excel's contour graphs look rather yucky (to me). It's even possible to do it in WolframAlpha, though you'd have to enter each residual computation individually. Desmos.com is also a good option, but again you might need to enter each residual computation individually.

Hint: abs(stuff) computes the absolute value of each element in a vector or matrix named "stuff", giving you back a vector or matrix of the same size. For example, in Matlab,
    abs([3 4 -5 0 -10]) = [3 4 5 0 10]
or in Python, use np.abs(whatever).

Copy/paste the contour plots into a Word file or the equivalent. Comment on how much the contours are like ellipses. (A Python sketch covering both #2 and #3 appears after problem #4.)

#3: Re-do the contour plot using the maximum absolute residual rather than the sum of squares or absolute values. Hint: use max() instead of sum(). If you're using Python, you might need np.max instead of plain max. Again, you might need to adjust the resolution of the grid or the contours you choose.

FYI, minimizing the maximum absolute residual is called "Chebyshev" regression. It is usually a bad idea--it is very susceptible to outliers.

Copy/paste the contour plots into a Word or Excel file or the equivalent. Comment on how much the contours are like ellipses.

#4: Gradients of Ordinary Least Squares. Consider the objective function:
    sum over i of (y_i - (m*x_i + b))^2
a) Compute its gradient symbolically, by hand.
b) Change the appropriate function (Matlab or Python) to include a computation of the gradient in the Ordinary Least Squares case. In Matlab, it will start like this:
    function [objfuncval,gr] = objfunc(myslope,myintercept,datamat)
and then the last line will be something like
    gr = (whatever formula will compute the gradient)
If you want to compute the gradient component-by-component, that is, one formula for the partial with respect to myslope and another for the partial with respect to myintercept, you can do
    gr(1) = (formula for partial w.r.t. slope)
    gr(2) = (formula for partial w.r.t. intercept)
Note that you don't have to actually run it in Matlab/Octave if you're struggling with them. Just type formulas that they would be able to interpret. Similarly with Python: if you're having trouble getting it running on your machine, you can still get full credit on this problem by just doing the coding correctly.

Copy/paste the resulting code into your homework file.
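For #4(b), here is one possible Python version of the function, a minimal sketch rather than an official solution. The names objfunc, myslope, myintercept, and datamat follow the Matlab skeleton above; the assumption that datamat holds the x data in its first column and the y data in its second is mine, so adjust it to match your own code, and check the gradient formulas against your hand derivation in part (a).

    import numpy as np

    def objfunc(myslope, myintercept, datamat):
        # Assumption: column 0 of datamat holds xdata, column 1 holds ydata.
        x = datamat[:, 0]
        y = datamat[:, 1]
        resid = y - (myslope * x + myintercept)   # vector of residuals
        objfuncval = np.sum(resid ** 2)           # sum of squared residuals
        gr = np.zeros(2)
        gr[0] = np.sum(-2.0 * x * resid)          # partial w.r.t. myslope
        gr[1] = np.sum(-2.0 * resid)              # partial w.r.t. myintercept
        return objfuncval, gr

    # Example call with made-up data:
    # datamat = np.column_stack(([1.0, 2.0, 3.0], [2.0, 4.0, 7.0]))
    # val, gr = objfunc(1.5, 0.0, datamat)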
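For #2 and #3, here is a minimal Python/Matplotlib sketch of one way to build the contour plots. The xdata and ydata values below are placeholders (the real ones come from the original contour-plot exercise), and the grid ranges, resolution, and number of contour levels are guesses that you will probably need to adjust, as the problems say.

    import numpy as np
    import matplotlib.pyplot as plt

    # Placeholder data -- substitute the xdata and ydata from the original exercise.
    xdata = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    ydata = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

    # Grid of candidate slopes (m) and intercepts (b); adjust ranges/resolution as needed.
    mvals = np.linspace(-2, 4, 201)
    bvals = np.linspace(-5, 5, 201)
    M, B = np.meshgrid(mvals, bvals)

    sum_abs = np.zeros_like(M)   # objective for #2: sum of absolute residuals
    max_abs = np.zeros_like(M)   # objective for #3: maximum absolute residual
    for x, y in zip(xdata, ydata):
        resid = np.abs(y - (M * x + B))
        sum_abs += resid
        max_abs = np.maximum(max_abs, resid)

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    axes[0].contour(M, B, sum_abs, levels=20)
    axes[0].set_title('Sum of absolute residuals (#2)')
    axes[1].contour(M, B, max_abs, levels=20)
    axes[1].set_title('Maximum absolute residual (#3)')
    for ax in axes:
        ax.set_xlabel('slope m')
        ax.set_ylabel('intercept b')
    plt.tight_layout()
    plt.show()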
#5 (from Winston, pg. 628): The energy used in compressing a gas in 3 stages from an initial pressure I to a final pressure F is given by the formula

    K*(sqrt(p1/I) + sqrt(p2/p1) + sqrt(F/p2) - 3)

where K is some arbitrary positive constant. Let's use K=7. Formulate an NLP whose solution describes how to minimize the energy used in compressing the gas.

a) Using the values I=1 (atmosphere) and F=10, solve the NLP numerically (using Excel Solver or equivalent); attach your Solver file along with your Word file when you submit the homework.
b) Based on the optimal p1 and p2 values you got, write down a formula for the optimal p1 and p2 as a function of I and F. This involves some investigation and guessing: try taking a look at p1/I, p2/p1, and F/p2.
c) Prove that your conjecture is indeed optimal (take the gradient, etc.). Note: I am asking you to make a guess, then prove the guess is optimal by showing it satisfies the appropriate equation. You could go the other way: take the gradient, and try to solve the equation gradient = zero vector. I think my way is easier.
d) Find the Hessian at the optimal point in part (a); your answer should be a matrix with numbers in it, rather than formulas. You could do this by finite differences, or by taking the Hessian symbolically and then plugging in values. Reminder: while some textbooks define the Hessian as the determinant of the matrix of 2nd derivatives, in optimization class the Hessian is that matrix itself, not just its determinant or one element.

(Python sketches for parts (a) and (d) appear at the end of this handout.)

#6 (optional): Re-do #2 and #3 using a different set of (x,y) data that do not fall perfectly on a straight line. For example,
    x  y
    1  1
    1  3
    5  5
    5  7
Do there seem to be multiple optima? You might also try doing a contour plot of the original least-squares objective function.

#7 (optional): a Control Theory problem. Implement Example 4 (Control) in Chapter 7.3 in the Luenberger textbook in Excel Solver or equivalent. Also try putting different weights on the x variables vs. the u variables. I'm also wondering: is the optimal solution a proportional control? That is, is u = (some constant)*x at each step? Or is there some other simple rule like that?

#8 (optional): an Approximation problem. Problem 1 in section 7.10 in the Luenberger textbook: how to approximate a function with a polynomial.

-----------------------------------------------------

Here are some of the Desmos graphs that I showed in class:
Choosing Delta-x in Finite Differences: https://www.desmos.com/calculator/2uve4ik24x
Newton Root, in 1 dimension: https://www.desmos.com/calculator/owomfhwz3o
Newton Optimizing, in 1 dimension: https://www.desmos.com/calculator/2nlhb7yjf7 or another version at https://www.desmos.com/calculator/mmcbljru5l
Gradient and Taylor Series Contour/Newton Direction in 2 dimensions: https://www.desmos.com/calculator/mgkpzuybef

------------------------------------------------------

Here is a very interesting presentation that talks about optimization and machine learning: http://pages.cs.wisc.edu/~swright/wright-kdd-2013.pdf
Also interesting: http://mrtz.org/blog/the-zen-of-gradient-descent/ and http://sebastianruder.com/optimizing-gradient-descent/ ("An overview of gradient descent optimization algorithms").

Also, below I'm including the list of ways that Matlab decides when to stop when optimizing.

% Here are the ways Matlab decides to stop:
%  1  Magnitude of gradient smaller than the specified tolerance.
%  2  Change in X smaller than the specified tolerance.
%  3  Change in the objective function value smaller than the specified tolerance (only occurs in the large-scale method).
%  0  Maximum number of function evaluations or iterations reached.
% -1  Algorithm terminated by the output function.
% -2  Line search cannot find an acceptable point along the current search direction (only occurs in the medium-scale method).
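For #5(a), the problem asks for Excel Solver or equivalent; if your "equivalent" is Python, here is a minimal sketch using scipy.optimize.minimize as a stand-in for Solver. The starting point and the bounds are my own choices, picked only to keep p1 and p2 between I and F so the square roots stay well-defined.

    import numpy as np
    from scipy.optimize import minimize

    K, I, F = 7.0, 1.0, 10.0   # K is arbitrary and positive; I and F are from part (a)

    def energy(p):
        p1, p2 = p
        return K * (np.sqrt(p1 / I) + np.sqrt(p2 / p1) + np.sqrt(F / p2) - 3.0)

    # Start somewhere between I and F; bounds keep the pressures in [I, F].
    result = minimize(energy, x0=[3.0, 6.0], bounds=[(I, F), (I, F)])
    print('optimal p1, p2:', result.x)
    print('minimum energy:', result.fun)
    # These ratios are worth inspecting for part (b), as the hint suggests:
    print('p1/I, p2/p1, F/p2:', result.x[0] / I, result.x[1] / result.x[0], F / result.x[1])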
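For #5(d), the finite-difference route can be done in a few lines. Below is a minimal, self-contained sketch of a central-difference Hessian; the step size h and the placeholder point p_star are assumptions, and p_star should be replaced by the optimal (p1, p2) you actually found in part (a).

    import numpy as np

    K, I, F = 7.0, 1.0, 10.0

    def energy(p):
        p1, p2 = p
        return K * (np.sqrt(p1 / I) + np.sqrt(p2 / p1) + np.sqrt(F / p2) - 3.0)

    def finite_difference_hessian(f, p, h=1e-4):
        # Approximate the matrix of second derivatives of f at p by central differences.
        p = np.asarray(p, dtype=float)
        n = len(p)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei = h * np.eye(n)[i]   # step of size h in coordinate i
                ej = h * np.eye(n)[j]   # step of size h in coordinate j
                H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                           - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h * h)
        return H

    # Placeholder point -- replace with the optimal (p1, p2) from part (a).
    p_star = np.array([3.0, 6.0])
    print(finite_difference_hessian(energy, p_star))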